SegFormer: A Topic Segmentation Model with Controllable Range of Attention
Authors
Abstract
Topic segmentation aims to reveal the latent structure of a document and divide it into multiple parts. However, current neural solutions are limited in how they model the context of sentences and the feature representations of candidate boundaries. This causes models to suffer from inefficient sentence encoding and from interference by noisy information. In this paper, we design a new text segmentation model, SegFormer, which uses unidirectional attention blocks to learn better sentence representations. To alleviate the problem of noise interference, SegFormer uses a novel additional aggregator and a topic classification loss to guide the model to aggregate information within an appropriate range. In addition, SegFormer applies an iterative prediction algorithm to search for optimal boundaries progressively. We evaluate SegFormer's generalization ability, multilingual ability, and application ability on challenging real-world datasets. Experiments show that our model improves performance on the WIKI-SECTION benchmark by 7.5% compared with several strong baselines. On a dataset built to separate normal segments from advertisement segments in product marketing essays, SegFormer also outperforms other cutting-edge models.
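The abstract names unidirectional attention and a controllable attention range but does not say how the range is enforced. One common way to limit attention is a mask over positions. The Python sketch below assumes a sentence-level mask in which each sentence attends only to itself and to at most `window` sentences on one side. It illustrates generic windowed unidirectional attention, not SegFormer's published implementation; `range_limited_mask` and `masked_softmax` are hypothetical names.

```python
import math
from typing import List


def range_limited_mask(n: int, window: int, backward: bool = True) -> List[List[bool]]:
    """Hypothetical helper: mask[i][j] is True when sentence i may attend to sentence j.

    Attention is unidirectional (only one side of i is visible) and limited to
    `window` sentences on that side, plus sentence i itself.
    """
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            # Distance from i to j, measured in the allowed direction.
            offset = i - j if backward else j - i
            row.append(0 <= offset <= window)
        mask.append(row)
    return mask


def masked_softmax(scores: List[float], allowed: List[bool]) -> List[float]:
    """Softmax over allowed positions only; masked positions get weight 0.

    Assumes at least one position is allowed (always true for rows of
    range_limited_mask, since sentence i can attend to itself).
    """
    kept = [s for s, ok in zip(scores, allowed) if ok]
    m = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: 5 sentences, each looking back at most 2 sentences.
mask = range_limited_mask(5, window=2)
print(mask[3])  # [False, True, True, True, False]
weights = masked_softmax([1.0, 1.0, 1.0, 1.0, 1.0], mask[3])
print(weights)
```

Widening `window` lets each sentence aggregate more context; narrowing it limits how far information from a neighbouring topic can leak into a sentence's representation, which is one plausible reading of "aggregate information within an appropriate range".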
Similar resources
Topic Segmentation with a Structured Topic Model
We present a new hierarchical Bayesian model for unsupervised topic segmentation. This new model integrates a point-wise boundary sampling algorithm used in Bayesian segmentation into a structured topic model that can capture a simple hierarchical topic structure latent in documents. We develop an MCMC inference algorithm to split/merge segment(s). Experimental results show that our model outpe...
Topic Segmentation with an Ordering-Based Topic Model
Documents from the same domain usually discuss similar topics in a similar order. However, the number of topics and the exact topics discussed in each individual document can vary. In this paper we present a simple topic model that uses generalised Mallows models and incomplete topic orderings to incorporate this ordering regularity into the probabilistic generative process of the new model. We...
A Hierarchical Bayesian Model for Topic Segmentation
Many streams of real-world data, such as conversations or body movements, consist of relatively coherent segments, each characterized by particular topics or controllers. Making sense of these data requires simultaneously segmenting the sequences and inferring the structure of the segments. We present a hierarchical Bayesian model that can be used to break a sequence of utterances or movements ...
A Dynamic Topic Model for Document Segmentation
Factor language models, like Latent Semantic Analysis, represent documents as mixtures of topics, and have a variety of applications. Normally, the mixture is computed at the whole-document level, that is, the entire document contains material on several topics, without specifying where they occur in the document. In this paper, we describe a new model which computes the topic mixture estimate ...
W3: A Controllable Brain Injury Model with a Defined Size for Evaluation of Tissue Engineered Products
Please refer to the English abstract.
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i11.26477